Target for Antiviral Therapy

The present invention provides a crystallised module of a nuclear phosphoprotein and
an assay and method of determining interactions with human papillomavirus E2 for
use in drug design, particularly but not exclusively in designing antiviral agents
with potential use in treating warts, proliferative skin lesions and carcinoma of
the cervix.

Background to the Invention 


Human papillomaviruses (HPVs) cause warts and proliferative lesions in skin and
other epithelia. In a minority of HPV types ("high risk", which include HPVs 16, 18,
31, 33, 45 and 56), further transformation of the wart lesions can produce tumours,
most notably carcinoma of the cervix. HPVs have evolved a sophisticated system of
control, mediated by protein:DNA and protein:protein interactions, that involves both
cellular and viral proteins. The 45 kDalton nuclear phosphoprotein, E2, has two
central roles in this control. It acts as the principal virally encoded transcription
factor and, in association with the viral E1 protein, it creates the molecular complex
at the origin of viral DNA replication.


E2 has three distinct modules. The N-terminal module (E2NT) of about 200 amino
acids is responsible for interactions with viral and host cell transcription factors. It is
followed by a flexible, proline-rich linker module and a C-terminal module (E2CT),
each of about 100 amino acids (Fig. 1a). The E2CT binds as a homodimer to DNA
sites with a consensus sequence of ACCGN4CGGT. In most HPVs a long upstream
regulatory region (URR) precedes the viral genes and contains four spatially
conserved E2 binding sites: three sites proximal to the transcription start site (p97 in



the function of the E2NT is to bind and localise at least three cellular transcription
factors, Sp1, TFIIB and AMF-1, to the transcription initiation complex. In addition,
E2 interacts with another viral protein, E1, which has ATPase and helicase activities.
E1 itself binds to the viral origin of replication, which consists of about 100 bp and is
surrounded by the three E2-binding sites proximal to the transcription start. The
E2:E1 interaction greatly increases the rate of HPV genome replication (Fig. 1a).
An intact E2 is essential for the normal productive (wart) life cycle of HPV; however,
during malignant progression HPV DNA is integrated into the host cell genome,
which usually results in disruption of the E2/E1 ORFs and loss of E2 protein, in turn
leading to dysregulated expression of the viral oncogenes E6 and E7.
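
The consensus E2 binding site given above (ACCG, four bases of any identity, then CGGT) can be located in a DNA sequence with a simple pattern search. The following is a minimal sketch only: the function name `find_e2_sites` and the toy sequence are hypothetical and are not taken from the present work, and a real analysis of a URR would also need to allow for the degenerate sites that occur in practice.

```python
import re

# Consensus E2 binding site from the text: ACCG, any four bases, CGGT.
# A lookahead is used so that overlapping sites would also be reported.
E2_SITE = re.compile(r"(?=(ACCG[ACGT]{4}CGGT))")


def find_e2_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 12-mer) for every consensus site."""
    seq = dna.upper()
    return [(m.start(), m.group(1)) for m in E2_SITE.finditer(seq)]


# Hypothetical toy sequence (NOT a real HPV URR) containing two sites.
toy_sequence = "TTACCGAAAACGGTGGGGACCGTTTTCGGTAA"
sites = find_e2_sites(toy_sequence)
for start, site in sites:
    print(start, site)
```

Because the consensus is palindromic, a site found on one strand is also a site on the complementary strand, so only one strand needs to be scanned.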




Consistent with its role as a transcription regulator, E2 has been shown to direct the
formation of loops in DNA containing E2 binding sites. The loops were only
formed with intact E2, and not with the E2CT alone. The E2 binding sites did not
function independently and their co-operative effect was mediated by full length E2,
leading the authors to suggest that there were specific interactions mediated by E2
that bridged across the set of DNA binding sites through its N-terminal module. A similar
DNA loop structure could also be achieved with Sp1, a cellular transcription factor,
which forms a complex with distally bound E2; Sp1/E2 interactions are critical for
transcription activation in BPV.


Eighty-six known E2 proteins from different species and different human subtypes
are highly conserved, with sequence identities typically of 35% in the N- and
C-terminal modules (Fig. 1b). The crystal structure of the E2CT has been determined
both alone and in complex with cognate DNA. The module is a dimer with a
barrel fold, and induces substantial bending (42-44°) of the DNA from its B-form
double helix.

The structure of a proteolytic fragment of HPV 18 E2NT, missing 65 N-terminal
residues, was recently reported at 2.1 Å spacing. This allowed some analysis of
mutational effects on function, although the missing 65 amino acids contain residues
which are essential for the transcriptional and replication activities of the protein.
We report herein the structure of the complete E2NT determined by X-ray analysis at
1.9 Å. We have found that it is an L-shaped molecule with the residues vital for
transcriptional and replication activities of the protein lying on opposite sides of the
N-terminal domain. Surprisingly, our results show that the surface vital for
transcription activation is in fact involved in association of two E2NTs into a dimer.
We suggest that dimerisation of E2NT plays a key role in induction
of DNA loop formation, the mechanism by which distally bound transcription factors
would be brought close to the site of transcription initiation. More importantly, our
results raise the possibility that dimer formation serves as a molecular switch
between early gene expression and viral genome replication during HPV infection.

Statement of the Invention 

According to a first aspect of the invention there is provided a crystallised molecular
complex of an E2 N-terminal module (E2NT) dimer protein or homologue thereof,
comprising residues vital for transcriptional and replicational activities of said
protein lying on opposite sides of an N-terminal domain, for use in rationalised drug
design.

Preferably the E2NT dimer protein is substantially as depicted in any of Figures 2c
and/or 3a-d.

According to a second aspect of the invention there is provided an in vitro method for
identifying and/or selecting a candidate therapeutic agent, the method comprising
determining interaction of an E2 N-terminal module (E2NT) dimer in a sample by
contacting said sample with said candidate therapeutic agent and measuring DNA
loop formation.



Preferably, the candidate therapeutic agent interferes with or blocks interactions of E2NT
so as to interfere with or block viral and/or cellular transcription factors.

According to a third aspect of the invention there is provided use of an E2NT dimer
in the preparation of a medicament for use in treating warts, proliferative skin lesions
and/or cervical cancer.




According to a fourth aspect of the invention there is provided a method of
monitoring the efficacy of an antiviral therapy in a patient receiving a medicament for
the treatment of warts, proliferative skin lesions and/or cervical cancer comprising
taking a sample from said patient and measuring E2NT interactions and/or DNA loop
formation.

Thus it will be appreciated that a patient can be monitored at the start of therapy to
test its effectiveness. Alternatively, a patient can be monitored once a therapy has
been established so as to monitor its efficacy with a view to altering a therapy if
found to be unsatisfactory.

The human papillomavirus E2 protein controls the primary transcription and
replication of the viral genome. Both activities are governed by a ~200 amino acid
N-terminal module (E2NT) which is connected to a DNA binding C-terminal module
by a flexible linker. The crystal structure of the E2NT module from high-risk type 16
human papillomavirus reveals an L-shaped molecule with two closely packed
domains, each with a novel fold. It forms a dimer in the crystal and in solution. The
dimer structure is important in the interactions of E2NT with viral and cellular
transcription factors and is the key to induction of DNA loops by E2. These loops
may serve to target distal DNA-binding transcription factors to the region proximal to
the start of transcription. The structure has implications for antiviral drug design and
cervical cancer therapy.

Detailed Description of the Invention

The invention will now be described by way of example only with reference to the 
following Figures and Tables wherein: 

Table 1 illustrates X-ray data and phasing statistics; 



Table 2 illustrates refinement and model correlation; 

Figure 1a represents functional assignments of HPV 16 E2 protein;

Figure 1b illustrates sequence alignment of E2NT modules from a subset of HPV
types;

Figure 2a illustrates a stereo view of electron density with a final model at the dimer
interface of the E2NT module, viewed down the crystallographic two-fold axis;

Figure 2b represents a stereo ribbon diagram of the E2NT module;

Figure 2c represents the E2NT dimer;

Figure 3a illustrates a schematic view of URR; 

Figure 3b illustrates a schematic view of loop formation induced by binding of E2
proteins to two cognate sites;

Figure 3c illustrates a model of E2 dimer formation; 



Figure 4a illustrates the distribution of conserved residues on the E2NT monomer;

Figure 4b illustrates a first cluster of conserved residues on the E2NT monomer;

Figure 4c illustrates a second cluster of conserved residues on the E2NT monomer;
and

Figure 4d illustrates conserved residues Gln12 and Glu39.


With reference to Figure 1a and the functional assignments of E2, there is shown a
schematic view of the NT, linker and CT modules of E2, indicating known functions of
each module. The amino acid numbers which delimit the modules correspond to E2
from HPV16. In Figure 1b, there is shown the sequence alignment of the E2NT
modules from a subset of HPV types (HPV16, HPV18, HPV11 and HPV2a) and one
BPV type. Shaded blocks above the alignment indicate the experimentally
determined secondary structure. Shaded blocks below the sequences indicate the
minimal peptide sequences involved in protein:protein interactions, suggested by
mutation studies. Residues with more than 90% identity among 86 PV types are
coloured: red for internal structural residues, green for residues within the fulcrum
region, blue for surface residues.

With reference to the structural features of E2, in Figure 2a there is shown a stereo
view of the electron density with the final model, at the dimer interface of the E2NT
module, viewed down the crystallographic two-fold axis. The likelihood weighted
map is contoured at the 1.5 σ level. Ribbons of two independent monomers are
coloured blue and yellow. Side chains of Arg37 and Ile73, which are known to be
critical for transactivation, are shown in dark green; side chains of other residues
at the dimer interface are shown in light green. Oxygen atoms are in red, nitrogen in
blue, water molecules are shown as orange spheres and hydrogen bonds as dashed
sticks. In Figure 2b, there is shown a stereo ribbon diagram of the E2NT module.
The N1 domain is shown in aquamarine and the N2 domain in pink, with the fulcrum
in green. In Figure 2c, there is shown the dimer of E2NT, showing the extent of the
interface between the two subunits. The view is as in Figure 2a but rotated clockwise
by 90°. Side chains of Gln12 and Glu39, which are critical for interactions with E1,
are shown in magenta. Side chains of residues at the dimer interface are
coloured as per Figure 2a.

With reference to Figures 3a-d there is shown loop formation in the URR of HPV16.
In Figure 3a, there is shown a schematic view of the URR. The four E2-binding sites
are represented by boxes. Numbers in italics indicate distances between individual
sites upstream of the p97 promoter. Two possible E2 configurations, with separate or
dimeric E2NT modules, are shown. In Figure 3b, there is shown a schematic view of
loop formation induced by binding of E2 proteins to two cognate sites, based on the
experiments reported by Knight et al. In Figure 3d, there are shown the possible DNA
loops within the URR as depicted in Figure 3b. In Figure 3c, there is shown a model
of the formation of E2 dimers, showing interactions between both the C-terminal and
E2NT modules. The C-terminal dimer, with its bound DNA, is based on the crystal
structure of this module. The E2NT dimer is proposed from the present work. The
relative orientation and position of the E2NT and C-terminal modules are purely
schematic.

With reference to Figures 4a-d there are shown functionally important residues. In
Figure 4a, there is shown the distribution of conserved residues on the E2NT
monomer. In Figures 4b and 4c there are shown the two clusters of conserved residues
in the fulcrum of E2NT. In Figure 4d, there are shown conserved residues Gln12 and
Glu39. Bonds in ball-and-stick models are coloured aquamarine (N1 domain), pink
(N2 domain) and green (fulcrum). Hydrogen bonds are shown as dashed lines, water
molecules as orange spheres, oxygen atoms are in red, nitrogen atoms in blue and
sulphur atoms in yellow.

There is convincing evidence that the E2 protein has an extended structure, is flexible 



problem is the extended flexible linker module, with around 100 residues. E2NT
proved difficult to crystallise, and a number of different constructs were made and
overexpressed before crystallisation with residues 1 to 201 was achieved, but even
this construct possessed limited stability. The protein had to be crystallised within
2-3 days of purification; crystals grew within about 48 hours but only retained useful
diffraction quality for a further 2-3 days. This necessitated that crystals be rapidly
vitrified in cryoprotectant buffer and stored for use as soon as detector time became
available.




Crystals of E2NT belong to the space group P3₁21 with unit-cell dimensions
a=b=54.3 Å, c=155.5 Å. The structure was determined using two heavy atom
derivatives and refined with data extending to 1.9 Å spacing (Fig. 2a). The main
chain is well defined throughout with the exception of residues 125 and 126, which
are in an exposed loop and are mobile. There was density for the last residue of the
His-tag at the N-terminus, but none for the remainder of this entity. All amino acids
lie in the allowed regions of the Ramachandran (φ,ψ) plot, with 92.4% in the most
favoured regions.
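
As a worked illustration of the unit-cell dimensions quoted above, the cell volume for a trigonal cell (a = b, γ = 120°) follows from V = a²·c·sin(120°). The sketch below also shows how a Matthews coefficient could be estimated; the molecular weight passed to it is a placeholder assumption, not a value from the present work, and the count of one monomer per asymmetric unit is likewise assumed on the basis that the dimer lies on a crystallographic two-fold axis. P3₁21 has six general positions per cell.

```python
import math


def trigonal_cell_volume(a: float, c: float) -> float:
    """Volume in cubic angstroms of a cell with a = b, alpha = beta = 90, gamma = 120."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(volume: float, mol_weight_da: float,
                         asu_per_cell: int = 6, copies_per_asu: int = 1) -> float:
    """Crystal volume per dalton of protein (cubic angstroms per Da)."""
    return volume / (asu_per_cell * copies_per_asu * mol_weight_da)


# Cell dimensions from the text.
volume = trigonal_cell_volume(54.3, 155.5)  # roughly 3.97e5 cubic angstroms

# ASSUMPTION: 24,000 Da is a placeholder for the tagged 201-residue construct;
# the actual molecular weight is not stated in this section.
assumed_mw = 24_000.0
vm = matthews_coefficient(volume, assumed_mw)
print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da (assumed MW)")
```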

The transactivation module is composed of two domains, N1 and N2, arranged so as
to give it an overall L-shaped appearance. Analysis of the PDB using DALI shows
that both have unique organisation of their secondary structures. Domain N1, which
forms the N-terminus of the intact E2, is composed of residues 1 to 92, which fold
into three long α-helices (Figure 2b,c). There is a tight loop between α1 and α2 and
a more extended one between α2 and α3. The three helices pack antiparallel to one
another in the form of a twisted plane, with angles of about 20° and 25° between the
pairs of consecutive helices. DALI indicated a maximum Z-score of 5.7, which could
suggest a significant correlation, for colicin Ia, a membrane protein which contains
three 80 Å long α-helices arranged more or less coplanar. This is the only other
known protein that contains a true domain made up of such a packing of three
helices. In addition there were 42 other structures which gave Z-scores above 4.0,
most of which were four-helix bundles, such as bacterioferritin. However, in these
only two of the three N1 helices superimposed simultaneously on two, not always
adjacent, bundle helices as a result of a more planar arrangement of helices within
N1. The indications are that the similarities observed reflect the optimum stacking
angle of antiparallel helices against one another rather than suggesting a common
ancestor for the evolution of these molecules.


Domain N2 is made up of residues 110 to 201 and is composed almost entirely of
antiparallel β structure, with only one short helical segment from residues 171 to 178
(Figure 2b,c). The secondary structure has two short three- and four-stranded
antiparallel β-pleated sheets interconnected by two-stranded β ribbons. For this
domain DALI failed to identify any significant homologies to known structures, with
[...]. From the analyses of Harris and Botchan and the
present study, the N2 fold appears to be novel.

The structure between the N1 and N2 domains (residues 93 to 109) contains two
consecutive single turns of helical structure, resulting in a compact and tight turn. It
packs closely against elements of both domains and is not a truly independent
structural domain. Rather it forms a fulcrum in the L-shape formed by N1 and N2
where it could act as a hinge, allowing the two domains to change their relative
conformation in a specific way. Several of the interactions between adjacent regions
of chain in the fulcrum are mediated indirectly through H-bonds involving water
molecules, suggesting the possibility of flexibility.

One of the most striking features of the crystal structure is the association of two
E2NT monomers into a tight dimer. The two E2NT monomers pack around the
crystallographic 2-fold axis, as shown in Figure 2a. The dimer interface is formed
mostly by amino acids from helices α2 and α3 of the N1 domain and by residues
142-144 from the N2 domain. The total buried surface area between the two E2NT is



In the E2NT dimer interface, each subunit contributes a cluster of seven equivalent
residues, invariant or conserved in the 86 known sequences of E2, with many direct
and water-mediated hydrogen bonds and rather few non-polar contacts (Fig. 2).
Analysis of the dimer-forming surfaces shows that all the direct hydrogen bonds
between monomers are made through these seven amino acids. For the invariant
Arg37, all possible side-chain hydrogen bonds are made and all are well defined
(Figure 2). Three of them are across the dimer interface. One hydrogen bond is
critical, from NH2 to the main chain carbonyl oxygen of Leu77. A second hydrogen
bond from NH2 is to OG1 of Thr81; in five out of 86 sequences this residue is
glutamine, and modelling shows a hydrogen bond is possible to the NE of Arg37.
The NH1 of Arg71 H-bonds to the OE1 of residue 80, which is Glu or Gln in all but
six variants. At the NE of Arg37 there is an ideal H-bond to water that itself makes
another strong H-bond across the dimer interface to the main-chain carbonyl oxygen
of residue 142. The role of the invariant Ile73 is the filling of the intersubunit
non-polar volume made up of the aliphatic parts of Arg37, Gln76 and of Leu77, in this
case from both monomers. Leu77 is in a few sequences substituted by valine or
isoleucine and in 9 out of 86 known sequences by methionine. Inspection of the
structure shows that Leu77 is partially exposed to the solvent and therefore different
hydrophobic side chains could be easily accommodated at this site. Another
important non-polar side chain is Ala69. Its side chain methyl packs into the surface
of the other monomer at van der Waals distance from the main chain of residue 142.
The only observed mutation of Ala69 is to Gly, and is easily accommodated. Gln76
is conserved or has homologous substitutions in about 2/3 of E2 sequences; in about
1/4 of the sequences there is methionine or valine at this position. Although
hydrophobic substitutions of Gln76 would disrupt the hydrogen bonding to Glu80
across the dimer interface, and to Arg37 from the same subunit, the hydrophobic side
chain at residue 76 could instead make a compensating hydrophobic interaction with
the adjacent intersubunit hydrophobic pocket formed by Ile73 and Leu77.

Modelling of the amino acid variations in the 86 known papillomavirus E2 proteins
into the other contacts at the dimer interface shows that they generally can be
accommodated (data not shown). The consistency of the hydrogen bonds and van der
Waals contacts at the monomer-monomer interface in the various sequences suggests
therefore that the E2NT dimer interactions are potentially present in all
papillomaviruses.

The first experimental evidence for E2NT dimerisation in the presence of DNA
with multiple E2-binding sites was provided by Knight et al. in 1991. Their studies
showed that intact E2 led to the formation of DNA loops on templates with widely
separated E2 binding sites, while a truncated E2, containing the DNA-binding E2CT
but missing the N-terminal 161 residues, did not. Such dimerisation is further
supported by the observed synergistic transcription activation by a complex of two
DNA-bound E2 dimers.

To analyse the functional behaviour of the E2NT dimers further, we measured the
dissociation constant by sedimentation equilibrium using analytical
ultracentrifugation of recombinant E2NT protein containing the 201 N-terminal
amino acids. A value of Kd = 8.1 ± 4 × 10⁻⁶ M was obtained, indicating
medium-strength association. A Kd in the micromolar range for the E2NT dimer is certainly
physiologically significant, and compares well with values for other transcription
factors which have relatively low dissociation constants, often with values
between 1 µM and 20 µM. In vivo, the interaction could be enhanced when the
two E2NT modules are placed in close proximity. Indeed, E2CT forms dimers which
bind to the multiple DNA-binding sites located within the URR of viral DNA with
Kd of protein:DNA interactions usually in the nanomolar range. Consequently, the
local concentration of E2NT, bound to the E2CT via the non-conserved, flexible ~80
amino-acid linker, is effectively increased.
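
To illustrate what a micromolar dissociation constant implies, the fraction of E2NT chains present as dimer can be estimated for a simple monomer-dimer equilibrium 2M ⇌ D, with Kd = [M]²/[D] and total chain concentration C = [M] + 2[D]. Solving the resulting quadratic gives [M] = Kd(√(1 + 8C/Kd) − 1)/4. The sketch below assumes this idealised two-state model; the function name is hypothetical, and the model ignores the tethering effect of E2CT described above, which would raise the effective local concentration.

```python
import math

KD_E2NT_MOLAR = 8.1e-6  # dissociation constant reported in the text (M)


def dimer_fraction(total_conc: float, kd: float = KD_E2NT_MOLAR) -> float:
    """Fraction of protein chains in dimers for the equilibrium 2M <-> D.

    total_conc is the total chain concentration [M] + 2[D], in the same
    units as kd, where kd = [M]^2 / [D].
    """
    if total_conc <= 0 or kd <= 0:
        raise ValueError("concentration and Kd must be positive")
    free_monomer = kd * (math.sqrt(1.0 + 8.0 * total_conc / kd) - 1.0) / 4.0
    return 1.0 - free_monomer / total_conc


# Illustrative total concentrations expressed as multiples of Kd.
for multiple in (0.1, 1.0, 10.0):
    conc = multiple * KD_E2NT_MOLAR
    print(f"{multiple:>5.1f} x Kd -> {dimer_fraction(conc):.2f} of chains dimeric")
```

Under this model half of the chains are dimeric when the total chain concentration equals Kd, and four fifths when it is ten times Kd.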

E2NT dimer interactions, as seen in the crystal structure, could form either between
modules which are already part of a single E2 dimer, formed as a result of E2CT
dimerisation, or between two preformed dimers located on different E2 binding sites.
The results of electron microscopy suggest that the latter dimerisation does
occur. Although no direct experimental evidence exists for the former dimerisation,
it does also seem possible due to the flexibility of the linker connecting the two
modules. We propose that E2 molecules may initially keep their N-terminal modules
within their internal dimers, but swap N-terminal modules and cross-link to E2
molecules bound to distant DNA binding sites to form active loop structures during
transcriptional activation and/or HPV DNA replication (Figure 3d). As discussed
below, the effects of mutations on transcriptional transactivation can be explained in
terms of the dimer being an essential element in this process.

E2 is a regulator of both transcription and viral DNA replication and thus interacts
with other viral and host macromolecules in the infected cell. Indication of the
possible importance of individual residues in the function comes firstly from the
structure, secondly from the extensive set of sequences of the papillomaviral E2s
and thirdly from mutagenesis studies on the individual proteins. In the following we
make a primary attempt to map the molecule's function onto its structure.

The pattern of amino acid conservation for the 86 available papilloma sequences
has been analysed using the GCG program suite. The sequences exhibit striking
variation, characteristic of some virus families. However, 33 of the total 201
residues in the E2NT construct were totally or highly conserved. Fig. 4a illustrates
the distribution of these 33 residues in the dimer. These were categorised into two
sets: those with an essentially structural role and those exposed on the surface with a
potential for intermolecular interactions. Thirteen residues (Fig. 1b) are buried or
play a purely structural role within the monomer; they are not expected to be of
functional importance and will not be discussed here.
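
The kind of conservation screen described above can be sketched as a per-column identity calculation over a multiple sequence alignment. This is an illustrative reimplementation only and is not the GCG procedure used in the present work; the function name, the treatment of gaps and the four-sequence toy alignment are all assumptions. The 0.9 default mirrors the "more than 90% identity" criterion used for Figure 1b.

```python
from collections import Counter


def conserved_columns(alignment: list[str], threshold: float = 0.9) -> list[int]:
    """Return 1-based alignment columns whose commonest residue exceeds threshold.

    Gap characters ('-') are never counted as a residue, but sequences with a
    gap still count in the denominator, so gapped columns score lower.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # column consists entirely of gaps
        top_count = counts.most_common(1)[0][1]
        if top_count / len(alignment) > threshold:
            conserved.append(col + 1)
    return conserved


# Hypothetical toy alignment (NOT real E2 sequences).
toy_alignment = ["MKAL", "MKGL", "MRAL", "MKAL"]
print(conserved_columns(toy_alignment))        # columns invariant in all four
print(conserved_columns(toy_alignment, 0.7))   # a looser threshold admits more
```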


A further 12 of these 33 residues stand out as having a structural role in the interface
of the N1 and N2 domains. They form three clusters, the first making direct
interactions between the two domains (Ile82, Glu90, Trp92, Lys112, Tyr138,
Val145) and two separate sets of interactions, one from N2 (Pro106, Lys111, Phe168,
Trp134) and the other from N1 (Trp33, Leu94) to the structure connecting them,
referred to here as a fulcrum. The first two clusters are shown in Figure 4b,c and it
can be seen that Lys111 and Lys112 play key roles. Their side chains point in
opposite directions to one another and their terminal amino groups are involved in
near ideal patterns of hydrogen bonds. The flat surfaces of their extended side chains
stack against Trp134 and Trp92, respectively. This clustering of invariant residues at
the interface indicates a functional importance for the relative orientation of N1 and
N2. The fulcrum could indeed provide a flexible pivot between the two domains, but
there is no direct evidence for this as yet. Finally, while the side chain of Glu90 is
held tightly in place by two H-bonds and could have a structural role, its OE2 atom is
exposed on the surface and is surrounded by near invariant side-chains, which may
thus play a part in interactions with other molecules.

Of the remaining eight conserved residues, mutational substitutions of Glu20,
Glu100 and Asp122 had moderate effects on the transactivation and replication
properties of E2, which depended on the particular viral strain. Glu20 lies on the top
surface of N1. Asp122 lies far away on the distal surface of N2. Glu100 is
completely exposed and points into the solvent at the junction of the L between the
N1 and N2 domains. The functional role of these amino acids has yet to be clarified.

Three conserved amino acids (Arg37, Glu39 and Ile73) have been subjected to point
mutation and the effects on the two principal functions of E2, i.e. transactivation and
HPV DNA replication, have been assessed. Together with
the remaining two conserved amino acids, Gln12 and Ala69, these residues form two
functionally important surfaces (see below).

Finally, a number of the mutational results correspond to
residues that can be assigned to structural roles. Substitution of these residues will
lead to substantial conformational changes and a probable inability to fold correctly.
This is particularly true for some of the deletion mutants involving the core of the


The induction of DNA loops by E2NT dimerisation could be important for the
construction of the active transcription bubble by targeting DNA-binding
transcription factors, bound at distal sites, to the region proximal to the start of
transcription. In support of this, residues Arg37, Ile73 and Gln76
map onto the surface of E2NT involved in dimer formation, and mutations result in
considerable disruption of transactivation, while having little effect on replication.
The structure also shows that Ala69, which points its side chain methyl across
the dimer interface, is also critical for transactivation. Mutational substitutions to
amino acids with longer side chains should have a knock-out effect on E2NT dimer
formation and consequently on transactivation.



The sites of association with cellular transcription factors AMF-1 (residues 74-134)
and TFIIB (134-216) were previously mapped onto the E2NT module (Figure 1)
using a series of deletion mutants as well as point mutations. These sites were
mutually exclusive. In the structure, residues 74-134 include the fulcrum, while
residues 134-216 correspond to domain N2. Further biochemical and structural
studies can now be planned to characterise these interactions in more detail.

Replication of the viral genome is initiated by binding of another viral protein, E1, to
the origin of DNA replication, which is itself flanked by two E2 binding sites (Fig.
3a). While the function of E2CT dimers is to bind specifically to the DNA sites,
E2NT interaction with E1 enhances the binding of E1 to this region. Mutational
substitutions of Glu39 generally retained transcriptional activation while DNA
replication was substantially reduced. In the structure, the conserved Glu39
makes every possible hydrogen bond by its side chain carboxyl oxygens (Fig. 4d).
One hydrogen bond is to NE2 of Gln12, which is absolutely conserved in all known
sequences of E2. The other three hydrogen bonds are to the water molecules which
are part of an intimate net of well-defined water molecules surrounding Glu39 and
mediating its interactions with adjacent residues. Interestingly, a number of these
protein interactions with water molecules are conserved as they are made to the
protein backbone, including carbonyl oxygens of Gln12, Met36 and Lys68. While
mutation of Gln12 in BPV1 only slightly affected both transactivation and
replication, it substantially reduced cooperative origin binding. The close
positioning of Gln12 and Glu39 in the three-dimensional structure further supports
the notion that these two residues are involved in interactions with E1. The conserved
set of interactions at Gln12/Glu39 suggests that the main chain carbonyl oxygens of
Gln12 and Met36 and the conserved water molecules could also be involved in these
interactions. Gln12/Glu39 are surrounded by Leu8, Ile15, Met36, Tyr43, Gln57 and
Lys68, which are unlikely to contribute to E2/E1 interactions, as these residues are
not well conserved in E2 sequences from different papillomaviruses.

The Gln12/Glu39 cluster lies on a side of the N1 domain which is opposite to the
side involved in transactivation (and dimerisation) (Figure 2c). Notably, the spatial
separation of the two functionally important surfaces suggests that the E2NT module
could be able to interact with E1 at the same time as it interacts through the
dimerisation interface with another E2NT module.

The structure reported here for the entire E2 transactivation module has several
implications for understanding of E2 function. It is now possible to map known
mutations onto the E2 three-dimensional structure, and to use the knowledge of
amino acid conservation and the effects of mutations to assign roles in folding,
structure and function to residues. To this end, our results indicate that molecular
surfaces involved in transactivation and E1-binding are located at opposite sides of
the N1 domain of E2NT, suggesting that both surfaces could be accessed
simultaneously by other protein factors. In line with these observations, E1 has been
shown to modulate transactivation by directly interacting with E2, leading to
repression of transactivation in the presence of excess E1. It is not inconceivable
that the docking of E2NT dimer with E1 is sufficient to block further association




The structure shows that the transactivation surface is involved in the formation of
the E2NT dimer, which could cross-link E2 molecules bound by their E2CT modules
to well-separated DNA sites. Inevitably, such dimerisation would cause DNA to
form a loop structure, targeting distally bound transcription factors to regions close to
the promoter. While this process has been suggested to be essential for
transactivation, the definition of interacting surfaces between E2 and other cellular
transcription factors requires a great deal of further study.

Our results suggest that the process of DNA loop formation could involve swapping
of E2NT modules across E2 dimers bound at separated DNA sites (Fig. 3a-d). The
polar components of the monomer-monomer interactions may favour such exchange.
Domain swapping is a well-recognised phenomenon that occurs relatively frequently
between two individual monomers containing domains connected by a flexible linker.
E2 is, to our knowledge, the first example where the swapping event is predicted
to occur between dimers.

The dimerisation surface of E2 represents a good target for designing anti-viral drugs, 
since it is essential for viral transcription, there is no homologous human protein and 
the residues forming the interface are highly conserved among different viral strains. 

Dynamic interactions between transcription factors play a central role in the 
regulation of transcription and replication. Dimerisation, heterodimerisation and the 
monomer-to-dimer transition may play important roles during the control of the 
papillomavirus life cycle. These processes themselves can be regulated through 
phosphorylation, proteolysis, interaction with small ligands or changes in their 
intracellular concentration. It has been suggested that E2 can regulate the switch 
between early gene expression and viral genome replication during HPV infection. 
It is possible that dimerisation of E2NT modules plays an essential role during this 
process. One scenario would be to activate transcription via induction of DNA loop 
formation at early stages of the viral life cycle. At later stages, when the 
concentration of expressed E2 proteins within the cell becomes high and comparable 
with the Kd for E2 dimer formation, free E2NT modules could compete for 
dimerisation with those involved in DNA loop formation and titrate them away, 
switching off transcription and stimulating replication. It is also possible that other 
protein factors could be involved in this process, including, for example, E1. Future 
studies on E2 interactions with viral and cellular transcription factors will reveal the 
exact structural events involved in transactivation and replication of 
papillomaviruses. Our results provide an essential step towards understanding these 
mechanisms. 
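
The concentration dependence invoked in this scenario can be illustrated with a minimal sketch, assuming a simple two-state monomer-dimer equilibrium for free E2NT modules with Kd = [M]^2/[D]. The function name, the arbitrary concentration units and the example ratios are illustrative assumptions and are not taken from the measurements described here; DNA-bound species and other protein factors are ignored.

```python
import math


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of protein chains found in dimers at equilibrium.

    Assumes Kd = [M]^2 / [D] and total_conc = [M] + 2[D], with both
    arguments expressed in the same (arbitrary) concentration units.
    """
    if total_conc <= 0 or kd <= 0:
        raise ValueError("total_conc and kd must be positive")
    # Positive root of 2[M]^2/Kd + [M] - total_conc = 0
    monomer = kd * (math.sqrt(1.0 + 8.0 * total_conc / kd) - 1.0) / 4.0
    return 1.0 - monomer / total_conc


if __name__ == "__main__":
    kd = 1.0  # hypothetical value in arbitrary units
    for ratio in (0.01, 0.1, 1.0, 10.0, 100.0):
        frac = dimer_fraction(ratio * kd, kd)
        print(f"total/Kd = {ratio:>6}: dimeric fraction = {frac:.3f}")
```

Under this assumption, half of the modules are dimeric when the total concentration equals Kd, and the dimeric fraction rises steadily above that point.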

Materials and Methods 

Purification and crystallisation. 

Details of the purification and crystallisation of E2NT have been described 
previously. Briefly, the ORF encoding the N-terminal 201 residues of HPV-16 E2 
was cloned into the prokaryotic expression plasmid pET15b downstream of the 
20-residue His-tag leader sequence; protein was expressed in E. coli BL21(DE3)pLysS 
and purified using nickel affinity and anion exchange chromatography. Crystals were 
obtained by hanging drop vapour diffusion with 0.8-1.2 M ammonium sulphate, 0.1 M 
triethanolamine pH 8.0-8.3 and 3-5% 2-methyl-2,4-pentanediol. Crystals grew only 
with very fresh protein preparations and deteriorated in terms of diffraction quality in 
less than a week. This necessitated freezing and storage of crystals in liquid nitrogen 
immediately after growth, as discussed above. 

Structure determination. 

All data were recorded on cryogenically frozen crystals. A native crystal was frozen 
for which initial data were recorded to 3.4 Å. For the screening of derivatives, 
crystal stability was even more limiting. Nine crystals were soaked in various heavy 
atom reagents immediately after growth. The crystals were screened in-house using a 
MAR Research imaging plate on a Rigaku RU200 rotating anode source, by recording 
[...] of 15-20% after scaling using SCALEPACK, and were stored in liquid nitrogen. 
The native crystal was transported to EMBL Hamburg where 1.9 Å data were 
measured using synchrotron radiation from beam line X11 (Table 1). In addition, data 
were recorded at EMBL for the three promising derivatives to about 2.7 Å. Two of 
these derivatives proved useful in phase determination and the structure was solved 
by multiple isomorphous replacement with anomalous scattering (MIRAS) at 2.7 Å. 
The two derivatives were solved independently using the CCP4 suite from the 
difference Patterson synthesis and by direct methods as implemented in SHELX. 
Both contained a single heavy atom site. Phases, calculated using MLPHARE, were 
enhanced by solvent flattening using a solvent content of 50 %. The resulting high 
quality density map was easily interpretable and the initial model was built using 
QUANTA (Molecular Simulations) for all but four residues of the construct, ignoring 
the His-tag. The model was completed with REFMAC (resolution 20-1.9 Å) using a 
bulk solvent correction, to an R-factor of 23.3 % (Rfree 29.7 % for 5 % of the data). 
There are 221 residues in the recombinant protein: the first twenty comprise the 
His-tag. The final model contains all but two of the 201 residues of the real protein: 
residues 125-126 are disordered and lie in a flexible surface loop. Only one residue, 
His0, of the His-tag has clear density and an ordered conformation. In addition there 
are 187 water molecules, which were selected using ARP during the course of 
refinement. The main statistics of the refined model are shown in Table 2. 

Analytical ultracentrifugation. 

Experiments were carried out in an Optima XL-A ultracentrifuge (Beckman Coulter, 
CA, USA) using scanning UV optics. During the experiments, the recombinant 
E2NT was in 10 mM Tris-HCl pH 8.0, 5 mM DTT, 0.2 mM EDTA, 300 mM NaCl. 
Data were obtained at rotor speeds of 12,000 and 16,000 rpm, and the time to 
equilibrium was 10-12 hours. All runs were carried out at 293 K, and all radial scans 
were at a wavelength of 280 nm. Dissociation constants were obtained by nonlinear 
regression using the Beckman ultracentrifuge software. 
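
For orientation, the radial concentration gradient expected for a single ideal species at sedimentation equilibrium can be sketched as follows. This is not the model implemented in the Beckman software, which is not described here; the molar mass of about 25 kDa for the tagged monomer, the partial specific volume of 0.73 mL/g, the solvent density of 1.0 g/mL and the radial positions are assumed values chosen only for illustration.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def equilibrium_ratio(molar_mass_g_mol: float, rpm: float,
                      r_cm: float, r_ref_cm: float,
                      temp_k: float = 293.0,
                      vbar_ml_g: float = 0.73,      # assumed partial specific volume
                      density_g_ml: float = 1.0) -> float:  # assumed solvent density
    """Return c(r) / c(r_ref) for one ideal species at sedimentation equilibrium.

    c(r)/c(r0) = exp[ M (1 - vbar*rho) w^2 (r^2 - r0^2) / (2 R T) ]
    """
    omega = rpm * 2.0 * math.pi / 60.0                      # rad/s
    buoyant_mass = (molar_mass_g_mol / 1000.0) * (1.0 - vbar_ml_g * density_g_ml)  # kg/mol
    r, r0 = r_cm / 100.0, r_ref_cm / 100.0                  # metres
    exponent = buoyant_mass * omega ** 2 * (r ** 2 - r0 ** 2) / (2.0 * GAS_CONSTANT * temp_k)
    return math.exp(exponent)


if __name__ == "__main__":
    monomer_mass = 25_000.0  # g/mol, hypothetical value for the tagged construct
    for rpm in (12_000, 16_000):  # rotor speeds quoted in the text
        mono = equilibrium_ratio(monomer_mass, rpm, r_cm=7.0, r_ref_cm=6.8)
        dimer = equilibrium_ratio(2 * monomer_mass, rpm, r_cm=7.0, r_ref_cm=6.8)
        print(f"{rpm} rpm: monomer ratio {mono:.2f}, dimer ratio {dimer:.2f}")
```

Because the exponent is proportional to molar mass, a dimer gives the square of the monomer ratio over the same radial interval, which is the property that allows monomer and dimer populations to be distinguished in a fit.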
Table 1 

X-ray data and phasing statistics 

Data set                              Native           UAc              AuCN 
Space group                           P3₁21            P3₁21            P3₁21 
a, b (Å)                              54.68            54.49            54.58 
c (Å)                                 155.73           155.66           156.50 
Resolution (Å)                        30-1.9           20-2.7           20-2.7 
Temperature (K)                       120              120              120 
Wavelength (Å)                        0.86             0.86             0.86 
Unique reflections                    21751            7873             7937 
Completeness (%) (outer shell)        98.8 (89.3)      99.8 (96.1)      99.7 (93.8) 
R-merge (outer shell)                 0.058 (0.339)    0.073 (0.271)    0.061 (0.268) 
Phasing power (centric / acentric)    -                1.55/2.07        0.95/1.40 
FOM: MIRAS                            0.59 
FOM: DM 20-2.7 Å (2.7-1.9 Å)          0.88 (0.61) 
DM: mean phase change (20-2.7 Å)      32° 
R-factor (FreeR)                      0.223 (0.295) 
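
The cell dimensions in Table 1 allow a rough consistency check on the 50 % solvent content used for solvent flattening. The sketch below estimates the Matthews coefficient, assuming one E2NT molecule per asymmetric unit, a molar mass of about 25 kDa for the 221-residue construct and the conventional protein-density factor of 1.23; these three values are assumptions, not figures stated in the text.

```python
import math
from typing import Tuple


def trigonal_cell_volume(a: float, c: float) -> float:
    """Volume (cubic angstroms) of a cell with a = b, alpha = beta = 90, gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_estimate(a: float, c: float, molar_mass_da: float,
                      symmetry_operators: int = 6,      # P3(1)21 has six general positions
                      molecules_per_asu: int = 1) -> Tuple[float, float]:
    """Return (Matthews coefficient in A^3/Da, estimated solvent fraction)."""
    volume = trigonal_cell_volume(a, c)
    vm = volume / (symmetry_operators * molecules_per_asu * molar_mass_da)
    solvent_fraction = 1.0 - 1.23 / vm  # conventional approximation for proteins
    return vm, solvent_fraction


if __name__ == "__main__":
    # Native cell from Table 1; the molar mass is a hypothetical round figure.
    vm, solvent = matthews_estimate(a=54.68, c=155.73, molar_mass_da=25_000.0)
    print(f"Vm = {vm:.2f} A^3/Da, solvent fraction = {solvent:.2f}")
```

With these assumptions the estimate comes out at a little over one half, in reasonable agreement with the value used for solvent flattening.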
Table 2 

Refinement and model correlation 

Resolution                                           1.9-10.0 Å 
Number of protein atoms                              162? 
Number of solvent sites                              211 
Number of reflections used in refinement             20637 
Number of reflections used for Rfree calculation     [illegible] 
R-factor (a)                                         0.232 
Rfree (a)                                            0.305 
Average atomic B-factor (Å²) 
    protein atoms                                    38.0 
    water molecules                                  48.5 
R.m.s. deviations from ideal geometry (Å), targets in parentheses 
    bond distance                                    0.013 (0.020) 
    angle distance                                   0.026 (0.040) 
    chiral volume                                    0.142 (0.200) 

(a) Crystallographic R-factor, R(free) = Σ ||Fo| - |Fc|| / Σ |Fo| 
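
The R-factor defined above, and the separation of a free set of reflections, can be sketched as follows. The function names and the random selection of the free set are illustrative assumptions; the refinement programs used here apply their own scaling and bookkeeping, which this sketch omits.

```python
import random
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(||Fo| - |Fc||) / sum(|Fo|) over the supplied reflections."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_work_free(n_reflections: int, free_fraction: float = 0.05,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly set aside a fraction of reflection indices as the free set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free


if __name__ == "__main__":
    # Toy amplitudes, invented purely to exercise the functions.
    observed = [10.0, 20.0, 30.0]
    calculated = [9.0, 22.0, 30.0]
    print(f"R = {r_factor(observed, calculated):.3f}")
    work, free = split_work_free(1000)
    print(f"{len(work)} working reflections, {len(free)} free reflections")
```

The working set is used in refinement and the R-factor computed over the free set alone gives Rfree, which is why the two values are reported separately.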
[Figure: HPV 16 E2 protein: functional assignments] 

[Figure: Two E2 dimers bound to separate sites. Dimers held together by C-terminal interactions; homotetramer held together by N-terminal dimerisation.] 